Beyond Click Graph: Topic Modeling for Search Engine Query Log Analysis
Abstract
Search engine query logs are a valuable source of information for analyzing users' interests and preferences. Existing work relies heavily on the click graph to analyze the information in a query log. However, the click graph typically suffers from low information coverage, fails to capture the diverse types of co-occurrence, and cannot discover the latent semantics in the data. In this paper, we go beyond the click graph and analyze query logs from the new perspective of probabilistic topic modeling. To systematically explore the possible assumptions about the latent structure of log data, we propose three topic models. The first, the Meta-word Model (MWM), unifies the co-occurrence of query terms and URLs through meta-word occurrence. The second, the Term-URL Model (TUM), captures the characteristics of query terms and URLs separately. The third, the Clickthrough Model (CTM), captures clicking behavior explicitly and models the ternary relation among search queries, query terms, and URLs. We evaluate the three proposed models against several strong baselines on a real-life query log. The experimental results show that the proposed models perform significantly better on several quantitative metrics and in applications such as date prediction, community discovery, and URL annotation.
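To make the contrast in the abstract concrete, the following is a minimal sketch of the two data representations it mentions: a click graph that links queries to clicked URLs, and a pooled "meta-word" view in which query terms and URLs share one vocabulary. The record layout `(user_id, query, clicked_url)`, the sample data, the `term:`/`url:` prefixes, and the choice to group records by user are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user_id, query, clicked_url). Illustrative data only.
LOG = [
    ("u1", "python tutorial", "docs.python.org"),
    ("u1", "python list sort", "docs.python.org"),
    ("u2", "cheap flights", "example-travel.com"),
    ("u2", "python tutorial", "example-learn.com"),
]


def build_click_graph(log):
    """Bipartite click graph: edge weight is the raw click count of (query, url)."""
    graph = defaultdict(Counter)
    for _user, query, url in log:
        graph[query][url] += 1
    return graph


def build_meta_word_docs(log):
    """One bag of meta-words per user (assumed grouping unit).

    Query terms and clicked URLs are pooled into a single vocabulary,
    so term-term, term-URL and URL-URL co-occurrence all show up in one bag.
    """
    docs = defaultdict(Counter)
    for user, query, url in log:
        for term in query.split():
            docs[user]["term:" + term] += 1
        docs[user]["url:" + url] += 1
    return docs


click_graph = build_click_graph(LOG)
meta_docs = build_meta_word_docs(LOG)
```

The click graph only records which query led to which URL, whereas the pooled bags could be passed to any standard topic model; which inference procedure the paper uses is not stated in the abstract.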
Similar Resources
Discovering Popular Clicks' Pattern of Teen Users for Query Recommendation
Search engines are still the most important gateways for information search on the internet. In this regard, providing the best response to the user's request in the shortest possible time is still desired. Normally, search engines are designed for adults, and few policies have been employed that consider teen users. Teen users are more biased than adult users in clicking on the results list. This leads...
Query Representation with Global Consistency on User Click Graph
Extensive research has been conducted on query log analysis. A query log is generally represented as a bipartite graph on a query set and a URL set. Most of the traditional methods used the raw click frequency to weigh the link between a query and a URL on the click graph. In order to address the disadvantages of raw click frequency, researchers proposed the entropy-biased model, which incorpor...
An Ensemble Click Model for Web Document Ranking
Every year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion
Contextual data plays an important role in modeling search engine users’ behaviors on both query auto-completion (QAC) log and normal query (click) log. User’s recent search history on each log has been widely studied individually as the context to benefit the modeling of users’ behaviors on that log. However, there is no existing work that explores or incorporates both logs together for contex...
Jigs and Lures: Associating Web Queries with Structured Entities
We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A la...